THE LOG-PROBABILITY LAW AND ITS ENGINEERING APPLICATIONS


Ven Te Chow,¹ A.M. ASCE


SYNOPSIS 


It is generally recognized that the log-probability law is applicable to the 
study of the statistical distribution of various engineering data. From an engi- 
neering viewpoint, this paper is prepared to present a theoretical treatment of 
the law and its applications which includes the following salient features: 

1) A comprehensive but brief survey of early studies and practical appli- 
cations of the log-probability law with a generous citation of references. 

2) A theoretical interpretation of the law and the derivation of equations 
for its characteristic values. 

3) Computation of the frequency factor and a revision of Hazen’s table of 
logarithmic skew curve factors. 

4) Verification of the fact that the extreme-value law is a special case of
the log-probability law.

5) Suggestion of a method of straight-line fitting of engineering data with
a variable coefficient of skew.


INTRODUCTION 


It has long been noted by engineers that statistical data of many kinds form
an extremely skew frequency distribution but their logarithms are nearly
normally distributed. The distribution of such data is known as the
logarithmico-normal distribution(1),² whose characteristics can be explained
theoretically by the central limit theorem(1)(2) and defined mathematically by
the log-probability law(3).

The log-probability law has been widely recognized among engineers as a 
convenient tool for recording, fitting, and frequency analysis of various
engineering data, because its practical use is quite simple and results obtained are
very satisfactory. However, the mathematical manipulation of the law is rather 
complicated. The theory which supports its practical applications has never 
been satisfactorily explained in engineering literature. Consequently, the law 
is not well understood by most engineers, and in fact, it often becomes more or 
less mysterious to many of its users. 

The objective of this paper is to present certain mathematical features of 
the log-probability law and to clarify certain aspects which are significant and 
useful to its engineering applications, particularly in regard to hydrologic prob- 
lems. No attempt is made to include procedures of the application of the law to 
practical problems that have been well-established in engineering practice and 
literature. However, a new approach to the use of the law is suggested. 


1. Research Asst. Prof. of Civ. Eng. and Member of the Teaching Faculty of 
the Graduate College, Univ. of Illinois, Urbana, Ill.

2. Numerals in parentheses, thus: (1), refer to corresponding items in the 
List of References (see Appendix I). 


Early Studies and Applications


The log-probability law is also known as the law of Galton, because it was
first studied by Galton as early as 1875(4)(5). Further mathematical studies
of this law were made by McAllister(6), Kapteyn(7), Gumbel(8)(9), Fisher(10),
Jenkins(11), Yuan(12), Cramér(1), Johnson(13), Cohen(14), Moshman(15), and
many others; but the treatments are not sufficient for the purposes of
engineering applications. It is surprising to note that most textbooks of
statistics and probability theory do not cover this law; whenever some of them
do, the treatment is usually very limited in extent.

As mentioned above, the practical use of the law is quite simple. Thus,
the law has been adopted by many authors in various diversified fields, such
as astronomy(16), agricultural science(17), biology(18)(19)(20), economic
statistics(21)(22), medical science(23), political economics(24)(25),
psychology(26), and engineering. In a review of the literature Gaddum(27) found
the possibility of many other interesting applications.³

3. Gaddum found that the law could be employed to describe: the threshold of
sensation; the size of silver particles in a photographic emulsion; the
sensitivity to drugs; the survival time of insects treated with disinfectants;
the average size of the different species in each of various phylogenetic
groups; the number of plankton caught in different hauls of a net; and the
amount of electricity used in medium-class American homes.

In the field of engineering, applications of the law have been found very
useful in mechanism, mining, structural mechanics, hydraulics, and particularly
in hydrology. Epstein(28) derived the logarithmico-normal distribution as the
asymptotic distribution of particle sizes resulting from breakage processes.
Krige(29) applied the log-probability law to the distribution of gold values in
mines. Johnson(30) claimed that the distribution of the strength in the
elementary volumes of various materials, particularly the perfectly plastic
materials, would be approximately logarithmico-normal. Hatch and Choate(31)
found that sand sizes will very closely follow the normal law if the logarithms
of the diameter of sand grains are used instead of the actual values
themselves. Blench(32) confirmed that this is also true for river-bed sand
samples. The applicability of the log-probability law to the distribution of
sizes of small particles was also discussed by Kolmogoroff(33), Halmos(34), and
Kottler(35). Horton(36) claimed that between 1900 and 1908 he had tried the use
of a special log-probability curve for the determination of maximum magnitude
and stage of flood, capacity of waste-weirs, etc., in connection with the
design work of the New York Barge Canal, and that similar methods were also
used to determine the law of frequency of occurrence of various other
hydrologic events, but he failed to specify clearly that the curve he used
represents a logarithmico-normal distribution. Allen Hazen(37) is believed to
be the first one who remarked that if the logarithms representing the several
floods are used instead of the numbers themselves, the agreement with the
normal law of error is closer. He proposed the use of a log-probability
paper(38) and developed a plotting procedure(39) for flood frequency analysis.
Other investigators have made similar studies and arrived at satisfactory
results. Among them are Gibrat(40) and Grassberger(41) who made use of the
plotting for the study of waterpower potentialities. Iwai(42) found that many
methods for estimating duration curves are based upon the theory of
logarithmico-normal distribution. He applied the theory to Japanese hydrologic
data and obtained remarkable results. Lane and Lei(43) used log-probability
plotting for duration curves of daily streamflow. Mitchell(44) applied the
logarithmic-probability graph to study the discharges of Illinois streams.
Chow(45) used similar plotting in developing the frequency factor in a general
formula for analyzing hydrologic data. Beard(46)(47)(48) and the Corps of
Engineers(49) have made extensive use of logarithmic plotting for flood peak
discharges. Foster(50)(51) used the method of arithmetic integration for
computing characteristics of the law. Yevdjević(52) applied the
log-probability plotting to discharge data of the Drina, Sava, and Danube
rivers in Yugoslavia. In an investigation of rainfall intensity-frequency data
McIllwraith(53) used the logarithmico-normal distribution for maximum yearly
rainfall and found favorable answers.


Theoretical Interpretation of the Law 


The log-probability law can be interpreted by the central limit theorem
which was first stated by Laplace(54) in the mathematical theory of probability.
The theorem has been modified and extended in various ways. For the present
purpose it may be stated as follows:

Whatever be the distribution of the independent variates $X_i$, subject to
certain very general conditions, the sum $X = X_1 + X_2 + \dots + X_r$
approaches, as $r$ increases, the normal distribution whose mean and variance
are respectively equal to the sum of the means and the sum of the variances
of the variates.

This theorem may be extended to various cases when the variates in the 
sum are either discrete or continuous, dependent or independent. 

Take the hydrologic data for example. It may be considered that the
occurrence of a hydrologic event of certain magnitude $x$ is a result of the
joint action of many causative meteorological and geographical factors(55). It
can be expressed mathematically that the magnitude $x$ is equal to the product
of $r$ independent magnitudes, $x_1, x_2, \dots, x_r$, which are respectively
due to the $r$ causative factors, or

$$x = x_1 x_2 \cdots x_r \tag{1}$$

where $r$ is very large. By taking the natural logarithms of both sides of the
above equation, the following is obtained

$$\log_e x = \sum_{i=1}^{r} \log_e x_i \tag{2}$$

This equation states that the logarithm of $x$ is equal to the sum of a very
large number of independent variates. Consequently, it follows from the central
limit theorem that $\log_e x$ is normally distributed, and hence that $x$ is
logarithmico-normally distributed.
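
The argument above can be checked numerically. The following is a minimal sketch, not part of the original paper. It assumes, purely for illustration, that each causative factor is an independent draw from a uniform distribution on (0.5, 1.5), forms the product of 50 such factors for each simulated event, and compares the sample skewness of x with that of its natural logarithm. The function names, the factor distribution, and the sample sizes are all hypothetical choices.

```python
import math
import random


def sample_skewness(values):
    """Return the moment coefficient of skewness of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_products(n_events=5000, n_factors=50, seed=1):
    """Simulate magnitudes x as products of independent positive factors."""
    rng = random.Random(seed)
    products = []
    for _ in range(n_events):
        x = 1.0
        for _ in range(n_factors):
            x *= rng.uniform(0.5, 1.5)  # hypothetical factor distribution
        products.append(x)
    return products


xs = simulate_products()
skew_x = sample_skewness(xs)
skew_log_x = sample_skewness([math.log(x) for x in xs])
print(f"skewness of x      : {skew_x:.2f}")
print(f"skewness of log_e x: {skew_log_x:.2f}")
```

Under these assumptions the raw magnitudes should come out strongly skewed while their logarithms should be nearly symmetrical, which is the behavior the central limit theorem predicts for a sum of many independent terms.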

Let $y = \log_e x$, then $y$ is normally distributed and the frequency
function of $y$ is

$$f(y) = \frac{1}{\sigma_y \sqrt{2\pi}}\, e^{-\frac{(y-\bar{y})^2}{2\sigma_y^2}} \tag{3}$$

in which $\bar{y} = \overline{\log_e x}$ and $\sigma_y = \sigma_{\log_e x}$ are
respectively the mean and standard deviation of $y$ and $e$ is the base of
natural logarithms. From the standard transformation formula of statistical
mathematics, or

$$\phi(x) = f(y)\,\frac{dy}{dx} \tag{4}$$

in which

$$\frac{dy}{dx} = \frac{1}{x} \tag{5}$$

the frequency function of $x$ is

$$\phi(x) = \frac{1}{x\,\sigma_y \sqrt{2\pi}}\, e^{-\frac{(\log_e x-\bar{y})^2}{2\sigma_y^2}} \tag{6}$$

or

$$\phi(x) = \frac{1}{\sigma_y\, e^{y} \sqrt{2\pi}}\, e^{-\frac{(y-\bar{y})^2}{2\sigma_y^2}} \tag{7}$$

Equation 6 represents the frequency function of the log-probability law. It
should be noted that the range of $y$ in normal distribution is from $-\infty$
to $\infty$, while the corresponding range of $x$ in logarithmico-normal
distribution is from 0 to $\infty$.
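
As an illustration of the frequency function of the log-probability law, the sketch below evaluates the density 1/(x σ_y √(2π)) · exp(-(log_e x - ȳ)² / (2σ_y²)) and confirms numerically that its area over the range of x is unity. The integration is carried out on a grid uniform in y = log_e x, using dx = x dy. The function names, the integration settings, and the parameter values are assumptions made for the example only.

```python
import math


def lognormal_frequency(x, y_mean, sigma_y):
    """Frequency function of x under the log-probability law."""
    if x <= 0.0:
        return 0.0  # the range of x is from 0 to infinity
    z = (math.log(x) - y_mean) / sigma_y
    return math.exp(-0.5 * z * z) / (x * sigma_y * math.sqrt(2.0 * math.pi))


def total_area(y_mean, sigma_y, steps=4000, width=8.0):
    """Integrate the frequency function over x by the midpoint rule in y."""
    lower = y_mean - width * sigma_y
    dy = 2.0 * width * sigma_y / steps
    area = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * dy
        x = math.exp(y)
        # dx = x dy, from dy/dx = 1/x
        area += lognormal_frequency(x, y_mean, sigma_y) * x * dy
    return area


print(round(total_area(y_mean=2.0, sigma_y=0.5), 6))
```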


Characteristic Values 


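The characteristic values of the log-probability law may be determined by
the method of moments. The $r$-th moment for the logarithmico-normal
distribution about the origin $x = 0$ takes the form:

$$M_r = \int_0^{\infty} x^r\, \phi(x)\, dx \tag{8}$$

Transforming the variate $x$ into $y$ by means of Eqs. 5 and 7, and simplifying,
the following is obtained

$$M_r = e^{r\bar{y} + r^2\sigma_y^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \tag{9}$$

where $z = (y - r\sigma_y^2 - \bar{y})/\sigma_y$ and $dz = dy/\sigma_y$. Since
$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = 1$, the $r$-th
moment is⁴

$$M_r = e^{r\bar{y} + r^2\sigma_y^2/2} \tag{10}$$

Assigning the value of $r$ = 0, 1, 2, and 3 for the first four moments and
computing the mean, $\bar{x}$, the standard deviation, $\sigma_x$, and the
skewness, $\alpha_x$, of $x$ by standard formulas, we obtain:

$$\bar{x} = e^{\bar{y} + \sigma_y^2/2} \tag{11}$$

$$\sigma_x = e^{\bar{y} + \sigma_y^2/2} \left(e^{\sigma_y^2} - 1\right)^{1/2} \tag{12}$$

$$\alpha_x = e^{3\bar{y} + (3/2)\sigma_y^2} \left(e^{3\sigma_y^2} - 3e^{\sigma_y^2} + 2\right) \tag{13}$$

4. The author wishes to point out that a similar formula for $M_r$ has been
developed by him and published elsewhere(56). There is a discrepancy between
that formula and Eq. 10 due to an error involved in the procedure of
derivation of the former. Consequently, the formulas for $\bar{x}$, $\sigma_x$,
and $\alpha_x$ published earlier(56) are in error and should be replaced
respectively by Eqs. 11, 12, and 13 of this paper.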


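The coefficient of variation is obtained from Eqs. 11 and 12 as

$$C_v = \frac{\sigma_x}{\bar{x}} = \left(e^{\sigma_y^2} - 1\right)^{1/2} \tag{14}$$

The coefficient of skew is obtained from Eqs. 12 and 13 as

$$C_s = \frac{\alpha_x}{\sigma_x^3} = \frac{e^{3\sigma_y^2} - 3e^{\sigma_y^2} + 2}{\left(e^{\sigma_y^2} - 1\right)^{3/2}} \tag{15}$$

Eliminating $\sigma_y$ from Eqs. 14 and 15, the relation between $C_s$ and
$C_v$ is expressed as

$$C_s = 3C_v + C_v^3 \tag{16}$$

Based on a graphical procedure, Foster(51) obtained an approximate relation
that $C_s = 3C_v$. When $C_v$ is less than 0.398, the error in $C_s$ obtained
by this approximation is within 5 percent.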
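
The characteristic values can be evaluated directly. The sketch below is an illustration rather than the author's procedure. Writing w = exp(σ_y²), it takes the mean as exp(ȳ + σ_y²/2), the coefficient of variation as (w - 1)^(1/2), and the coefficient of skew as (w³ - 3w + 2)/(w - 1)^(3/2), then checks the relation C_s = 3C_v + C_v³ and the size of the error in Foster's approximation C_s = 3C_v. The function names are hypothetical. The value σ_y = 0.3525 is the one the paper quotes for C_s = 1.139.

```python
import math


def characteristic_values(y_mean, sigma_y):
    """Return mean, standard deviation, C_v and C_s of x."""
    w = math.exp(sigma_y ** 2)
    x_mean = math.exp(y_mean + 0.5 * sigma_y ** 2)
    c_v = math.sqrt(w - 1.0)
    sigma_x = x_mean * c_v
    c_s = (w ** 3 - 3.0 * w + 2.0) / (w - 1.0) ** 1.5
    return x_mean, sigma_x, c_v, c_s


def foster_relative_error(c_v):
    """Relative error in C_s when C_s = 3 C_v + C_v**3 is replaced by 3 C_v."""
    exact = 3.0 * c_v + c_v ** 3
    return (exact - 3.0 * c_v) / exact


x_mean, sigma_x, c_v, c_s = characteristic_values(y_mean=0.0, sigma_y=0.3525)
print(f"C_v = {c_v:.3f}, C_s = {c_s:.3f}")
print(f"3 C_v + C_v^3 = {3.0 * c_v + c_v ** 3:.3f}")
print(f"Foster error at C_v = 0.39: {foster_relative_error(0.39):.3f}")
```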

It should be noted that when $y$ is normally distributed, its mean $\bar{y}$ is
equal to its median $M_y$, or

$$\bar{y} = M_y \tag{17}$$

which is the value of $y$ at a probability of “50%-of-time”. Let $M$ be the
median of $x$, then

$$M_y = \log_e M \tag{18}$$

From Eqs. 17 and 18,

$$\bar{y} = \log_e M \tag{19}$$

or

$$M = e^{\bar{y}} \tag{20}$$

The ratio of the mean to median for $x$ may be obtained from Eqs. 11 and 20,

$$\frac{\bar{x}}{M} = e^{\sigma_y^2/2} \tag{21}$$

Equations 14, 15, and 21 indicate that the characteristic values of the
log-probability law, $\bar{x}/M$, $C_v$, and $C_s$, are functions of
$\sigma_y$ only. For given values of $\sigma_y$, these values can be plotted
as shown in Fig. 1. Similar plottings have been prepared by Langbein(57) in
which the characteristic values are plotted against $\sigma_g$, which is equal
to $e^{\sigma_y}$. His curves for $\bar{x}/M$ and $C_v$ are based on similar
equations presented by Kalinske(3), but his curve for $C_s$ is based on
Hazen’s approximate values⁵ which are shown later to be different in certain
ranges from the theoretical values.

By means of arithmetic integration, Foster has constructed similar curves
by which the characteristic values are plotted against the “index of
variability,” $I_v$, ranging from 0 to 0.25 for all values(51) and also from 0
to 1.00 for the reciprocal of [illegible in source]. The term “index of
variability” was first proposed by Lane and Lei(43) and it is defined as the
standard deviation of the logarithms of the values of discharges at 10%
intervals from 5% to 95% of the time read off from a duration curve. Chow(58)
shows that this index is theoretically equivalent to the standard deviation
$\sigma_y$. In the practical procedure, the value of $I_v$ is obtained usually
from the plotting of a frequency curve on a common-logarithmic probability
paper in which the common logarithm is employed. While the value of $\sigma_y$
is based on the natural logarithm, it may be converted into the value of $I_v$
by

$$\sigma_y = (\log_e 10)\, I_v = 2.3026\, I_v \tag{22a}$$

or

$$I_v = (\log_{10} e)\, \sigma_y = 0.4343\, \sigma_y \tag{22b}$$

With this conversion, a check against Foster’s curves indicates a good
agreement between Foster’s values and the theoretical values.

Equation 3 states that $y$ is normally distributed. In other words, values of
$y$ will plot as a straight line on a normal probability paper, or values of
$x$ will plot as a straight line on a logarithmic probability paper. The
distribution function corresponding to Eq. 3 and representing the probability
or percentage of time at which the magnitude $y$ is exceeded is therefore,

$$F(y) = \frac{1}{\sigma_y \sqrt{2\pi}} \int_y^{\infty} e^{-\frac{(y-\bar{y})^2}{2\sigma_y^2}}\, dy \tag{23}$$

From Eq. 4, it is evident that the probability $F(y)$ for $y$ is equal to the
probability of $x$, or $\Phi(x)$. It may therefore be written that
$F(y) = \Phi(x) = P$.

The following is to prove a graphical method for the construction of a flow
duration curve. From any standard normal probability function table, it may
be found that the argument is equal to unity when $P$ = 15.87%. Taking $y = y'$
and $x = x'$ at this probability, the argument is

$$\frac{y' - \bar{y}}{\sigma_y} = 1$$

or

$$\sigma_y = y' - \bar{y} \tag{24}$$

Since $y' = \log_e x'$ and, by Eq. 19, $\bar{y} = \log_e M$, Eq. 24 becomes

$$\sigma_y = \log_e x' - \log_e M = \log_e (x'/M) \tag{25}$$

5. Confirmed by a letter from Mr. W. B. Langbein dated November 27, 1953, in
which he stated: “... Kalinske did not give any formula for $C_s$. I didn’t
derive any equation, I merely used Hazen’s Table II of his book on Flood
Flows.”

When the common-logarithmic probability paper is used for plotting, then

$$I_v = \log_{10} (x'/M) \tag{26}$$

or

$$x'/M = \operatorname{antilog}_{10} I_v \tag{27}$$

This equation proves the graphical method for the construction of a duration
curve as suggested by Lane(59) which was stated as:


“The shape of the duration curve can be obtained by drawing a straight 
line duration curve on logarithmic probability paper with a slope such 
that the ratio of the discharge exceeded 15.87% of the time to the discharge 
exceeded 50% of the time is equal to the antilogarithm of the variability 
index selected.” 
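
Lane's rule can be illustrated numerically. The sketch below is an assumption-laden example and not a procedure from the paper. It converts an index of variability between common and natural logarithms (σ_y = 2.3026 I_v), confirms that a unit argument of the normal distribution is exceeded about 15.87% of the time, and evaluates the ratio x'/M as the antilogarithm of the index. The function names and the index value 0.2 are hypothetical.

```python
import math
from statistics import NormalDist

LN10 = math.log(10.0)


def sigma_y_from_index(i_v):
    """Natural-log standard deviation from the index of variability."""
    return LN10 * i_v


def index_from_sigma_y(sigma_y):
    """Index of variability from the natural-log standard deviation."""
    return sigma_y / LN10


def ratio_at_unit_argument(i_v):
    """x'/M: discharge exceeded 15.87% of the time divided by the median."""
    return 10.0 ** i_v


# Probability that a normal variate exceeds its mean by one standard deviation.
p_exceed = 1.0 - NormalDist().cdf(1.0)

i_v = 0.2  # hypothetical index of variability
print(f"P at unit argument = {100.0 * p_exceed:.2f}%")
print(f"sigma_y = {sigma_y_from_index(i_v):.4f}")
print(f"x'/M    = {ratio_at_unit_argument(i_v):.4f}")
```

The same ratio is obtained from exp(σ_y), since the two forms differ only in the base of the logarithm.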


Frequency Factor 


Frequency factor is another characteristic value of the log-probability law.
As this value is more significant in engineering than in its mathematical
implication, more discussion is necessary, and therefore it is treated
separately under its own heading. This term, known as the reduced variate in
mathematics, was used by Chow(45)(60) in connection with a general formula for
hydrologic frequency analysis which is expressed as

$$x = \bar{x} + \sigma_x K \tag{28}$$

in which $K$ is the frequency factor which, as defined, depends upon the law of
occurrence of a particular hydrologic event under consideration. Comparing
the use of Eq. 28 with Hazen’s method(39)⁶ of log-probability plotting, it can
be easily seen that the frequency factor for log-probability law is
theoretically identical with Hazen’s so-called logarithmic skew curve factor.
Substituting expressions of $x = e^y$ (since $y = \log_e x$ as defined
previously), $\bar{x}$ from Eq. 11 and $\sigma_x$ from Eq. 12 in Eq. 28, and
solving for $K$, the following is obtained

$$K = \frac{e^{\sigma_y K_y - \sigma_y^2/2} - 1}{\left(e^{\sigma_y^2} - 1\right)^{1/2}} \tag{29}$$

in which

$$K_y = \frac{y - \bar{y}}{\sigma_y} \tag{30}$$

This equation may be written as

$$y = \bar{y} + \sigma_y K_y \tag{31}$$

6. See Chap. VII, pp. 48-59 of the reference(39).

By the definition of frequency factor or by comparing Eq. 31 with Eq. 28, it is
evident that $K_y$ is the frequency factor of $y$ which is normally
distributed. Substituting this relation in Eq. 23, the normal distribution
function or probability of $y$ is

$$P = \frac{1}{\sqrt{2\pi}} \int_{K_y}^{\infty} e^{-K_y^2/2}\, dK_y \tag{32}$$

where $K_y$ is the argument. Therefore, for any given probability $P$, the value
of $K_y$ may be obtained from a normal probability function table. With known
values of $K_y$ and $\sigma_y$, the frequency factor $K$ may be computed by Eq. 29.
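
The computation just described can be sketched in a few lines. The example below is illustrative only. It takes the frequency factor as K = (exp(σ_y K_y - σ_y²/2) - 1) / (exp(σ_y²) - 1)^(1/2), with K_y read from the standard normal distribution instead of a printed table. The function names are hypothetical, and P is assumed to be an exceedance probability expressed as a fraction. The values σ_y = 0.3525 and P = 99% are those of the worked example given later in the paper, for which the text reports K = -1.610.

```python
import math
from statistics import NormalDist


def normal_frequency_factor(p_exceed):
    """K_y: standard normal argument exceeded with probability p_exceed."""
    return NormalDist().inv_cdf(1.0 - p_exceed)


def frequency_factor(p_exceed, sigma_y):
    """Frequency factor K of the log-probability law."""
    k_y = normal_frequency_factor(p_exceed)
    if sigma_y == 0.0:
        return k_y  # limiting case of zero skew, where K equals K_y
    numerator = math.exp(sigma_y * k_y - 0.5 * sigma_y ** 2) - 1.0
    return numerator / math.sqrt(math.exp(sigma_y ** 2) - 1.0)


print(f"K = {frequency_factor(0.99, 0.3525):.3f}")
```

Small differences in the third decimal place relative to the paper are to be expected, because the paper's value rests on a tabulated K_y of -2.327.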

In 1921, Hazen⁷ published a table for values of $K$, and in the meantime, he
made a remark which reads:


“The factors in (the table) were reached by an approximate procedure 
that will not be described. They are used as a means of drawing smooth 
curves and do not replace, but are used as aid to, the purely graphical 
procedure. If some member of the Society who enjoys mathematical 
problems, would work out factors corresponding exactly with theoretical 
skew curves, it would be appreciated.” 


This table, however, was revised later in 1930 by a graphical procedure using
an artificially prepared series of 25 terms each(39).⁸ For the sake of
reference, Hazen’s revised table is reproduced here as Table 1.⁹ Since its
publication, this table has been widely used and referred to in many
engineering offices, such as the U. S. Bureau of Reclamation(61). However, it
should be noted that Hazen’s method has not been accepted by most
statisticians, mainly because of its empirical nature and the inaccuracy of
the table. In Hazen’s table, rows of values of $K$ for assigned values of
$C_s$ and probabilities are listed. The corresponding exact values of the
theoretical frequency factor may be computed by the theory developed in this
paper as follows:

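First the value of $\sigma_y$ for a given value of $C_s$ can be computed from
Eq. 15. Equation 15 may be simplified to the form of a cubic equation,

$$e^{3\sigma_y^2} + 3e^{2\sigma_y^2} - (4 + C_s^2) = 0 \tag{33}$$

Solving the real root of this equation, the value of $\sigma_y$ in terms of
$C_s$ is expressed as

$$\sigma_y = \left\{ \log_e \left[ \left( \frac{C_s^2 + 2 + C_s\sqrt{C_s^2 + 4}}{2} \right)^{1/3} + \left( \frac{C_s^2 + 2 - C_s\sqrt{C_s^2 + 4}}{2} \right)^{1/3} - 1 \right] \right\}^{1/2} \tag{34}$$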
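
The inversion from C_s to σ_y can be sketched as follows. Writing w = exp(σ_y²), the coefficient of skew (w³ - 3w + 2)/(w - 1)^(3/2) factors into (w + 2)(w - 1)^(1/2), and squaring gives the cubic w³ + 3w² - (4 + C_s²) = 0. The code below solves that cubic by Cardano's formula, which is an assumption about method: the closed form printed in the paper may be arranged differently. The function names are hypothetical and C_s is assumed to be non-negative.

```python
import math


def skew_from_sigma_y(sigma_y):
    """Coefficient of skew written as (w + 2) * sqrt(w - 1), w = exp(sigma_y**2)."""
    w = math.exp(sigma_y ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)


def sigma_y_from_skew(c_s):
    """Real root of w**3 + 3*w**2 - (4 + c_s**2) = 0, returned as sigma_y."""
    a = 0.5 * (c_s ** 2 + 2.0 + c_s * math.sqrt(c_s ** 2 + 4.0))
    # The two cube-root terms of Cardano's formula are reciprocals of each
    # other here, so only one cube root needs to be evaluated.
    root = a ** (1.0 / 3.0)
    w = root + 1.0 / root - 1.0
    # Guard against a tiny negative logarithm caused by rounding near c_s = 0.
    return math.sqrt(max(0.0, math.log(w)))


for c_s in (0.5, 1.139, 2.0):
    s = sigma_y_from_skew(c_s)
    print(f"C_s = {c_s}: sigma_y = {s:.4f}, "
          f"back-check C_s = {skew_from_sigma_y(s):.4f}")
```

For C_s = 1.139 this should reproduce, to about three decimal places, the value σ_y = 0.3525 quoted in the paper.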


From the given probability $P$, the value of $K_y$ may be found from a normal
probability function table due to the relation expressed by Eq. 32.
Substituting in Eq. 29 the values of $\sigma_y$ and $K_y$ thus found, the
theoretical frequency factor can be computed. In response to the encouraging
words stated by Hazen in 1921, the complete set of theoretical values
comparable to Hazen’s table were computed and are listed in Table 2.¹⁰ The
coefficients of variation were computed by Eq. 14. The probabilities at mean
are those which occur when the frequency factor is equal to zero, as indicated
by Eq. 28. From Eq. 29, with $K = 0$, it is found that $K_y = \sigma_y/2$, and
the corresponding probabilities at mean were found from a normal probability
function table. A comparison between Hazen’s values and the theoretical values
of frequency factor indicates that discrepancy exists for high coefficients of
skew, at which Hazen’s factors are higher. Accordingly, for higher coefficients
of skew, Hazen’s probability at mean becomes lower and the corresponding
coefficient of variation is higher.

7. See p. 219 of the reference(38).

8. See p. 49 and Table II on p. 188 of the reference(39).

9. This table is reproduced by permission from “Flood Flows” by Allen Hazen,
published by John Wiley and Sons, Inc., 1930. The permission was granted
by the publisher in a letter dated January 5, 1954.

10. It should be noted that in the case of $C_s = 0$, Eq. 34 gives
$\sigma_y = 0$, and Eq. 29 becomes indeterminate. In order to solve this
puzzle, the exponential forms of “e” in Eq. 29 should be expanded into infinite
series as follows:

$$K = \frac{K_y - \frac{\sigma_y}{2} + \frac{\sigma_y}{2}\left(K_y - \frac{\sigma_y}{2}\right)^2 + \cdots}{\left(1 + \frac{\sigma_y^2}{2} + \cdots\right)^{1/2}}$$


Relation with the Extreme-Value Law 


The law for the distribution of extreme values is based on the theory
developed by Fisher and Tippett in 1928(62). This theory has been applied to
various fields of science and engineering. Its first application to the
frequency distribution of hydrologic data was made by Gumbel in 1941(63). In
principle, it is stated that annual maximum values of certain years of record
approach a pattern of frequency distribution which can be defined by the
extreme-value law when the number of observations in each year becomes large.
The frequency factor for this law has been expressed by Chow(45)(55) as follows:

$$K = -\frac{\sqrt{6}}{\pi} \left\{ \gamma + \log_e \left[ \log_e T - \log_e (T - 1) \right] \right\} \tag{35}$$

in which $\gamma$ = 0.5772157 . . . , the so-called Euler’s constant, and $T$ is
the recurrence interval for the annual maximum values and is related to the
probability $P$ by

$$T = \frac{1}{P} \tag{36}$$

For various probabilities, $P$ equal to or greater than the given variate, the
corresponding frequency factors for the extreme-value law can be computed by
Eqs. 35 and 36, and the resulting values are shown in Table 3. For example,
with a probability of $P$ = 99% equal to or greater than the given variate,
Eq. 36 gives $T$ = 1.010 yr. Substituting this value of $T$ in Eq. 35, the
value of $K = -1.641$.

The theory of extreme values postulates 1.139 for the coefficient of skew(64).
For this coefficient, corresponding frequency factors of the log-probability law
were computed by the procedure described in the previous article and are given
in Table 3. For example, with $C_s$ = 1.139, Eq. 34 gives $\sigma_y$ = 0.3525.
With a probability of $P$ = 99% equal to or greater than the given variate, the
value of $K_y$ in Eq. 32 obtained from a normal probability function table is
equal to $-2.327$. Substituting $\sigma_y$ = 0.3525 and $K_y = -2.327$ in
Eq. 29, the value of $K = -1.610$.
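
The extreme-value side of this comparison can be reproduced with a short sketch. It assumes the frequency factor has the form K = -(√6/π){γ + log_e[log_e T - log_e(T - 1)]} and that T = 1/P with P expressed as a fraction, which is consistent with the worked value T = 1.010 yr at P = 99%. The function name is hypothetical, and the log-probability value -1.610 is simply copied from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329


def extreme_value_frequency_factor(p_exceed):
    """Frequency factor of the extreme-value law for exceedance probability P."""
    t = 1.0 / p_exceed  # recurrence interval, assuming P is a fraction
    reduced = math.log(math.log(t) - math.log(t - 1.0))
    return -(math.sqrt(6.0) / math.pi) * (EULER_GAMMA + reduced)


k_ev = extreme_value_frequency_factor(0.99)
k_log = -1.610  # log-probability value quoted in the text for C_s = 1.139
print(f"extreme-value K = {k_ev:.3f}")
print(f"difference from log-probability K = {abs(k_ev - k_log):.3f}")
```

The function is only defined for 0 < P < 1, since T must exceed one year for the inner logarithm to exist.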

A comparison between values of frequency factor obtained by the two laws
as shown in Table 3 indicates a maximum difference of less than 1.8% for the
given probabilities. It may therefore be concluded that for practical purposes
the extreme-value law is but a special case of the log-probability law. With
$C_s$ = 1.139 the value of $C_v$ computed by Eq. 16 for the log-probability
law is equal to 0.364. It means that the two laws are identical at the special
case that $C_s$ = 1.139 and $C_v$ = 0.364. It is understood that for every
value of $C_s$ there is a corresponding value of $C_v$ as indicated by the
relation of Eq. 16 for which the log-probability law must be satisfied. With
$C_s$ = 1.139 the log-probability law holds true only for $C_v$ = 0.364 but not
for other values of $C_v$. This may also be demonstrated by the log-probability
plottings in Fig. 2. These are plottings of the variate $x/\bar{x}$ against the
probability for the given value of $C_v$ = 0.364 and various values of $C_s$.
The plotted values of $x/\bar{x}$ were computed by Eq. 28, using $K$ values
obtained from Table 2 and Table 3 for different probabilities and values of
$C_s$. For example, with $P$ = 99% and $C_s$ = 1.0, Table 2 gives
$K = -1.68$. The value of $\sigma_x = \bar{x} C_v = 0.364\,\bar{x}$. Then Eq. 28
gives $x = \bar{x} + (0.364\,\bar{x})(-1.68)$, or $x/\bar{x}$ = 0.388. Other
values used for the construction of curves in Fig. 2 were computed by the same
procedure and are listed in Table 4. It can be seen that of all curves plotted
only the one with $C_s$ = 1.139 is a straight line. It is apparent that this
line follows both the log-probability law and the extreme-value law, while the
other curves with values of $C_s$ different from 1.139 do not.

10. (cont.) Dropping terms containing $\sigma_y$, then $K = K_y$. After the
completion of this table the author discovered that the frequency factor is
theoretically equal to the so-called critical values described in a paper by
Jack Moshman(15). Critical values corresponding to $C_s$ = 0 to 3.0 and for
probabilities equal to 99, 95, 5, and 1 percent are listed in Table I of that
paper on pp. 606-609. These values are checked exactly against the author’s
frequency factors.

Perhaps it is essential to compare the frequency factors of both laws from
a purely theoretical viewpoint. Equation 28 indicates that $x$ is a linear
function of the frequency factor $K$ as long as $\bar{x}$ and $\sigma_x$ or
$C_v$ are constants. In the extreme-value law the coefficient of skew is a
constant equal to 1.139, and Eqs. 35 and 36 show that the value of $K$ is a
function of $T$ or $P$ only and it is independent of $C_s$ and, of course, also
of $C_v$. In the log-probability law, the coefficient of skew is not a
constant, and Eqs. 29 and 32 show that the value of $K$ is not only a function
of $P$, but also depends upon the value of $\sigma_y$, which in turn depends
upon $C_s$ by Eq. 34 and on $C_v$ by Eq. 16. Therefore, each law has its own
limitations. The extreme-value law is limited for a constant $C_s$ of 1.139
and the log-probability law is limited by a pair of values of $C_s$ and $C_v$
which should satisfy Eq. 16. There is a possibility of eliminating these
limitations and developing a method which is applicable to all values of $C_s$
and $C_v$. This possibility will be described in the subsequent articles.


Flexibility of Curve-Fitting


It is interesting to note that the straight line plotting in Fig. 2 divides all
curves into two groups: one group which is concave upward and the other
concave downward. It means that for the given value of $C_v$ = 0.364, the
log-probability law furnishes a great number of conditions to fit curves which
may be either straight or concave upward, or concave downward as it appears on
a log-probability paper. The extreme-value law, however, offers only one
possibility of fitting, that is the straight line for which $C_s$ = 1.139. The
fact that the log-probability law gives more possibilities of fitting for a
given value of $C_v$ may be seen in the procedure of deriving a graphic
coefficient of skew as suggested by Hazen(39). This procedure may be outlined
as follows:


1) Compute values of $\bar{x}$, $C_v$, and $C_s$ by the following fundamental
statistical formulas:

$$\bar{x} = \frac{\sum x}{n} \tag{37}$$

$$C_v = \sqrt{\frac{\sum \left(\frac{x}{\bar{x}} - 1\right)^2}{n - 1}} \tag{38}$$

$$C_s = \frac{\sum \left(\frac{x}{\bar{x}} - 1\right)^3}{(n - 1)\, C_v^3} \tag{39}$$

in which $n$ is the total number of items and the summations extend over all
items. It should be noted that unlike the mean and the coefficient of variation
the accuracy of the computed value of $C_s$ depends much more greatly upon the
number of items in the series used for computing it. If the computed $C_s$ is
not satisfactory in producing a theoretical curve of fit, the best value of
$C_s$ may be developed by a graphic procedure described in the subsequent
steps.

2) With the computed value of $C_s$, find the frequency factors corresponding
to various probabilities $P$ from Table 1. Since the exact theoretical
log-probability frequency factors are developed in this paper, Table 2 should
be used instead of Table 1.

3) Compute values of $x$ by Eq. 28 in which values of $\bar{x}$ and $C_v$ are
computed in step 1 and values of $K$ for various probabilities are obtained in
step 2.

4) Plot values of $x$ against the corresponding probabilities on a
log-probability paper. Connect the plotted points with a smooth curve which is
the theoretical probability curve.

5) The observed data are plotted on the same probability paper for the sake
of comparison with the theoretical curve. The probabilities of the observed
data required for the plotting purpose are known as the plotting positions.
Hazen(39) used the following formula for computing plotting positions of the
observed data on the log-probability paper:

$$P = \frac{2m - 1}{2n} \tag{40}$$

in which $n$ is the total number of statistical events and $m$ is the rank of
events arranged in an order of descending magnitude; for instance, $m$ = 1 for
the largest value and $m = n$ for the smallest value.

6) If the theoretical curve does not fit the observed data satisfactorily,
a lower or higher assumed value than the computed $C_s$ should be used to
develop another theoretical curve. Figure 2 indicates that increasing the value
of $C_s$ tends to change the curvature of the theoretical curve to become
eventually concave upward, and decreasing the value of $C_s$ will have the
reverse effect. Taking this fact as a guide, the new value of $C_s$ can be
easily assumed.
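
Steps 1, 3, and 5 of the procedure above can be sketched as follows. This is an illustration under stated assumptions, not Hazen's or the author's own computation. The sample coefficients are assumed to take the conventional forms with n - 1 in the denominators, applied to the ratios x/x̄ - 1; the plotting position is assumed to be the usual Hazen form (2m - 1)/(2n) expressed as a fraction; and the flow values are made up. The function names are hypothetical.

```python
import math


def sample_coefficients(data):
    """Return mean, C_v and C_s of a sample (assumed n - 1 forms)."""
    n = len(data)
    x_mean = sum(data) / n
    ratios = [x / x_mean - 1.0 for x in data]
    c_v = math.sqrt(sum(r ** 2 for r in ratios) / (n - 1))
    c_s = sum(r ** 3 for r in ratios) / ((n - 1) * c_v ** 3)
    return x_mean, c_v, c_s


def hazen_plotting_positions(n):
    """Positions (2m - 1)/(2n) for ranks m = 1 (largest) to n (smallest)."""
    return [(2 * m - 1) / (2 * n) for m in range(1, n + 1)]


def theoretical_values(x_mean, c_v, factors):
    """Step 3: x = x_mean * (1 + C_v * K) for each frequency factor K."""
    return [x_mean * (1.0 + c_v * k) for k in factors]


flows = [120.0, 95.0, 210.0, 150.0, 80.0, 300.0, 130.0]  # made-up sample
x_mean, c_v, c_s = sample_coefficients(flows)
ranked = sorted(flows, reverse=True)
for p, x in zip(hazen_plotting_positions(len(ranked)), ranked):
    print(f"P = {p:.3f}  x = {x:.0f}")
print(f"mean = {x_mean:.1f}, C_v = {c_v:.3f}, C_s = {c_s:.3f}")
```

The frequency factors passed to `theoretical_values` would come from the table of theoretical factors for the chosen coefficient of skew; adjusting that coefficient and recomputing is the trial procedure of step 6.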

In fact, it is possible to make any log-probability plotting in the form of a
straight line on a specially designed probability paper. The method is
described as follows:

Equation 28 indicates that the variate $x$ is a linear function of $K$. When
$x$ is plotted against $K$, the plotting is a straight line. For the
convenience of demonstration, Eq. 28 may be written as

$$\frac{x}{\bar{x}} = 1 + C_v K \tag{41}$$

in which $x/\bar{x}$ is a dimensionless ratio and $C_v = \sigma_x/\bar{x}$. By
plotting values of $x/\bar{x}$ against $K$, this equation represents a straight
line. The slope of the line is equal to $C_v$. Figure 3 shows straight line
plottings of this equation for various values of $C_v$. In this figure an
auxiliary probability scale may be constructed to correlate the value of $K$
with the probability for a given coefficient of skew. Take, for instance, a
value of $C_s$ = 1.139. The values of $K$ and the corresponding probabilities
can be obtained either by interpolation from values in Table 2 or by
computation as shown in Table 3. These values were used to prepare a curve
showing a continuous relationship between $K$ and the probability. The curve
was then used to construct the auxiliary probability scale as shown in Fig. 3.

It should be noted that of all straight lines shown in Fig. 3 only the one with
a slope equal to 0.364 satisfies the log-probability law. It means that for a
given value of $C_s$ = 1.139, the extreme-value law furnishes a large number of
conditions to fit curves with straight lines; while the log-probability law
offers only one possibility of straight line fitting for which $C_v$ = 0.364.
As mentioned previously, however, for all log-probability distributions a
straight line plotting is always possible on a probability paper specially
constructed for the particular value of $C_s$. The well-known Gumbel paper
developed by Powell(65) is one type of such probability paper specially
designed for $C_s$ = 1.139.

Therefore, for any value of $C_s$, there is an available set of log-probability
frequency factors for various probabilities such as those given in Table 2.
These frequency factors and the corresponding probabilities can be used to
construct a special probability paper. On this probability paper, an infinite
number of straight lines can be plotted and all of them can be represented by
Eqs. 28 or 41. However, only one of these lines can satisfy the log-probability
law. This is the line the slope of which must be equal to the value of $C_v$
related to the given value of $C_s$ by Eq. 16. Other lines do not follow
theoretically the log-probability law but will offer an infinite number of
chances for a best fit of the given data. This is the basic principle upon
which a proposed method described in the following article is developed.
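
The construction of such a special probability paper can be sketched numerically. The example below is a hedged illustration. For a chosen coefficient of skew it locates each probability on the K axis using the log-probability frequency factor, and then evaluates x/x̄ = 1 + C_v K for two slopes: 0.364, which the paper pairs with C_s = 1.139, and 0.5, an arbitrary alternative that still plots as a straight line on the same paper. The function names, the Cardano-style solution for σ_y, and the list of probabilities are assumptions.

```python
import math
from statistics import NormalDist


def sigma_y_for_skew(c_s):
    """Solve w**3 + 3*w**2 - (4 + c_s**2) = 0 for w = exp(sigma_y**2)."""
    a = 0.5 * (c_s ** 2 + 2.0 + c_s * math.sqrt(c_s ** 2 + 4.0))
    root = a ** (1.0 / 3.0)
    return math.sqrt(max(0.0, math.log(root + 1.0 / root - 1.0)))


def probability_scale(c_s, probabilities):
    """Return (P, K) pairs locating each exceedance probability on the K axis."""
    sigma_y = sigma_y_for_skew(c_s)
    scale = []
    for p in probabilities:
        k_y = NormalDist().inv_cdf(1.0 - p)
        if sigma_y == 0.0:
            k = k_y  # zero skew: the scale is that of normal probability paper
        else:
            k = (math.exp(sigma_y * k_y - 0.5 * sigma_y ** 2) - 1.0) / math.sqrt(
                math.exp(sigma_y ** 2) - 1.0
            )
        scale.append((p, k))
    return scale


def fitted_ratio(c_v, k):
    """x / x_mean on a straight line of slope C_v."""
    return 1.0 + c_v * k


for p, k in probability_scale(1.139, [0.99, 0.9, 0.5, 0.1, 0.01]):
    print(f"P = {p:4.2f}  K = {k:7.3f}  "
          f"slope 0.364: {fitted_ratio(0.364, k):.3f}  "
          f"slope 0.5: {fitted_ratio(0.5, k):.3f}")
```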


A Proposed Method 


The comparison of the log-probability law with the extreme-value law de- 
scribed above leads to the evolution of a new method which might provide a 
satisfactory procedure of curve fitting to data of various kinds. This procedure 
simply consists of the following steps: 

1) Compute values of $\bar{x}$, $C_v$, and $C_s$ respectively by Eqs. 37, 38,
and 39.

2) Determine the graphic coefficient of skew by the procedure described
previously.

3) Construct a probability paper for the graphic $C_s$. An ordinary rectangular
cross-section paper may be used for this purpose and an auxiliary probability
scale in conformity with the graphic $C_s$ may be constructed thereon by the
method described in the previous article.

4) Draw a straight line of fitting, passing through the mean ordinate where
$x = \bar{x}$. The probability for the mean ordinate may be obtained, by
interpolation if necessary, from Table 2.

5) The observed data may be plotted on the same probability paper for the 
sake of comparison with the theoretical curve. The formula used by Hazen, 
Eq. 40, for computing plotting positions is purely empirical and has the 
disadvantage of exaggerating the recurrence interval for large events. Beard(46) 
has improved this formula by an arbitrary adjustment, and he prepared a table 
for the estimation of plotting positions. However, the author suggests 
the use of the following simple and practical formula 


$$P = \frac{m}{n+1} \qquad (42)$$


for plotting positions of annual maximum values, which he derived theoretically 
by the theorem of the mean number of exceedances(95). This formula produces 
results very close to those obtained by a method developed earlier by 
Gumbel(66), which is based on the theory of the most probable largest value and 
requires interpolation and the use of a table. Before it was verified 
theoretically, Eq. 42 had been used empirically to replace 
the Gumbel procedure for plotting data on the Gumbel paper. 
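The difference between the two plotting-position formulas is easiest to see on a short record. The sketch below is illustrative only: it assumes that Hazen's Eq. 40 has its commonly quoted form P = (2m - 1)/(2n) and that Eq. 42 is P = m/(n + 1), with m the rank in descending order and n the number of items. The function names are hypothetical.

```python
def hazen_position(m: int, n: int) -> float:
    """Assumed form of Hazen's Eq. 40: P = (2m - 1) / (2n), as a fraction."""
    return (2 * m - 1) / (2 * n)


def exceedance_position(m: int, n: int) -> float:
    """Assumed form of Eq. 42: P = m / (n + 1), as a fraction."""
    return m / (n + 1)


def recurrence_interval(p: float) -> float:
    """Recurrence interval in years for an annual exceedance probability p."""
    return 1.0 / p


n = 20  # hypothetical record of 20 annual maxima
for m in (1, 2, 5, 10, 20):
    t_hazen = recurrence_interval(hazen_position(m, n))
    t_eq42 = recurrence_interval(exceedance_position(m, n))
    print(f"rank {m:>2}: Hazen T = {t_hazen:6.2f} yr, Eq. 42 T = {t_eq42:6.2f} yr")
```

For the largest event in this 20-year example the assumed Hazen formula assigns a recurrence interval of 2n = 40 yr, against n + 1 = 21 yr from Eq. 42, which is the exaggeration for large events mentioned in the text.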

In connection with step 2 of the procedure outlined above, Hazen also 
suggested an alternate method(39) which uses an adjusted coefficient of skew 
instead of the graphic coefficient of skew for the construction of a theoretical 
log-probability curve. By this method the computed C_s is multiplied by a 
correction factor which is computed by 


$$F = 1 + \frac{8.5}{n} \qquad (43)$$


where n is the total number of items in the series of data. This formula is 
also shown at the bottom of Table 1. The computed C_s multiplied by this 
correction factor gives the so-called "adjusted coefficient of skew," which is used 
directly for the construction of the theoretical probability curve. Later, 
Foster found that the correction of the coefficient of skew depends upon both 
the number of items and the value of C_s computed by Eq. 39, and he prepared a 
diagram showing the correction factor required for a given number of items and 
the computed value of C_s. However, Foster has expressed¹² to the author that he 
has never been too well satisfied with the correction formula, Eq. 43, nor with the 
diagram, which attempted to give a more complete representation of the original 
computed correction factors. On this account he has started to compute a new 
set of factors based on the log-probability curves. It is hoped that the results 
of his computation will be made available soon. 
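The adjusted-skew method reduces to a single multiplication. The sketch below assumes that the correction factor of Eq. 43 has the form commonly attributed to Hazen, F = 1 + 8.5/n; that form and the function name are assumptions of this example, not quotations from the paper.

```python
def hazen_adjusted_skew(cs_computed: float, n: int) -> float:
    """Adjusted coefficient of skew, assuming F = 1 + 8.5 / n for Eq. 43."""
    if n <= 0:
        raise ValueError("n must be a positive number of items")
    correction_factor = 1.0 + 8.5 / n
    return cs_computed * correction_factor


# Hypothetical series: computed C_s = 1.0 from records of different lengths.
for n in (17, 34, 85):
    print(f"n = {n:>2}: adjusted C_s = {hazen_adjusted_skew(1.0, n):.3f}")
```

Under this assumed form the factor depends only on n, so it approaches unity as the record lengthens and takes no account of the computed C_s itself, which is the dependence Foster's diagram was intended to supply.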

The method of graphic coefficient of skew requires a number of trial 
computations in order to arrive at a satisfactory value of C_s and the theoretical 
probability curve of best fit. However, it is believed that this procedure can 
produce a curve of best fit to the points representing the data on a 
log-probability paper; hence it seems suited to quite general application and is 
useful with skew frequency curves from a great variety of sources. The method 
of adjusted C_s is direct and simple, without the need of trial computations, 
but it is not always satisfactory because the correction formula 
and diagram are not theoretically exact. 

The method described above offers a straight-line plotting of data with a 
variable coefficient of skew. The well-known Gumbel method of plotting is a 
special case of such plotting, proposed for a constant value of C_s = 1.139. 
It is expected that this method should fit engineering data better than either 
the Hazen method or the Gumbel method, as it in reality combines the merits 
of both. Of course, the practical advantages of this method will depend further 
upon its application to various engineering data by interested persons. 


11. See pp. 70-71 of reference (51). 
12. In a letter dated February 14, 1954. 

APPENDIX II. NOMENCLATURE 
The following letter symbols, adopted for use in the paper, are listed for 
the guidance of discussers: 

C_v = coefficient of variation of x. 
C_s = coefficient of skew of x. 
e = base of natural logarithms. 
F = correction factor for the computed C_s. 
F(y) = probability in percentage equal to or greater than the given variate 
       of magnitude y. 
f(y) = frequency function of y. 
I_v = index of variability. 
K = frequency factor of x. 
K_y = frequency factor of y. 
M = median of x. 
M_r = r-th moment for the logarithmico-normal distribution about the 
      origin x = 0. 
M_y = median of y. 
m = rank of statistical events arranged in an order of descending magnitude. 
P = F(y) = Φ(x) = probability in percentage equal to or greater than 
    the given variate of magnitude y or x, where y = log_e x. 
r = order number of statistical variate, or number of causative factors, 
    or order of statistical moment. 
T = recurrence interval in yr. 
X = an independent statistical variate. 
X_r = the r-th statistical variate. 
x = magnitude of a statistical variate. 
x_r = magnitude of the r-th statistical variate. 
x' = value of x exceeded 15.87% of the time. 
x̄ = mean value of x = Σx/n. 
\overline{x^2} = mean value of squares of x = Σx²/n. 
\overline{x^3} = mean value of cubes of x = Σx³/n. 
y = log_e x. 
y' = log_e x' at P = 15.87%. 
ȳ = mean of y. 
[one entry illegible] 
α = skewness of x. 
γ = Euler's constant = 0.5772157... 
σ = standard deviation of x. 
σ_y = standard deviation of y. 
Φ(x) = probability in percentage equal to or greater than the given 
       variate of magnitude x. 
φ(x) = frequency function of x. 


TABLE 2 - Theoretical Log-Probability Frequency Factors 


                               Probability in percentage 
                       Equal to or greater than the given variate 
C_s    Prob.   C_v        99     95     80     50     20      5      1    0.1   0.01 
       at 
       mean, % 

Sign of K                  -      -      -      -      +      +      +      +      + 

0      50.0    0        2.33   1.65   0.84   0      0.84   1.64   2.33   3.09   3.72 
0.1    49.3    0.033    2.25   1.62   0.85   0.02   --     1.67   2.40   3.22   3.95 
0.2    48.7    0.067    2.18   1.59   0.85   0.04   0.83   1.70   2.47   3.39   4.18 
0.3    48.0    0.100    --     1.56   0.85   0.06   0.82   1.72   2.55   3.56   -- 
0.4    47.3    0.136    2.04   1.53   0.85   0.07   0.81   1.75   2.62   3.72   4.70 
0.5    46.7    0.166    1.98   1.49   0.85   0.09   0.80   1.77   2.70   3.88   4.96 
0.6    46.1    0.197    1.91   1.46   0.85   0.10   0.79   1.79   2.77   4.05   5.24 
0.7    45.5    0.230    --     --     --     --     --     --     --     --     -- 
0.8    --      0.262    1.79   1.40   0.84   0.13   0.77   1.8    2.90   4.37   5.81 
0.9    --      0.292    1.74   1.37   0.84   --     0.76   1.84   2.97   4.55   6.11 
1.0    43.7    0.324    1.68   1.34   0.84   0.15   0.75   1.85   3.03   4.72   6.40 
1.1    43.2    0.351    1.63   1.31   0.83   0.16   0.73   1.86   3.09   4.87   6.71 
1.2    42.7    0.381    1.58   1.29   0.82   0.17   0.72   1.87   3.15   5.04   7.02 
1.3    --      0.409    1.54   1.26   0.82   0.18   0.71   1.88   3.21   5.19   7.31 
1.4    41.7    0.436    1.49   1.23   0.81   0.19   0.69   1.88   3.26   5.35   7.62 
1.5    41.3    0.462    1.45   1.21   0.81   0.20   0.68   1.89   3.31   5.51   7.92 
1.6    40.8    0.490    1.41   1.18   0.80   0.21   0.67   1.89   3.36   5.66   8.26 
1.7    40.4    0.517    1.38   1.16   0.79   0.22   0.65   1.89   3.40   5.80   8.58 
1.8    40.0    0.544    1.34   1.14   0.78   0.22   0.64   1.89   3.44   --     8.88 
1.9    39.6    0.570    1.31   1.12   0.78   0.23   0.63   1.89   3.48   6.10   9.20 
2.0    39.2    --       1.28   1.10   0.77   0.24   0.61   1.89   3.52   6.25   9.51 
2.1    38.8    0.620    1.25   1.08   0.76   0.24   0.60   1.89   3.55   6.39   9.79 
2.2    --      --       1.22   1.06   0.76   0.25   0.59   1.89   3.59   6.51  10.12 
2.3    38.1    --       1.20   1.04   0.75   0.25   0.58   1.88   3.62   6.65  10.43 
2.4    37      0.691    1.17   1.02   0.74   0.26   0.57   1.88   3.65   6.77  10.72 
2.5    37.4    0.713    1.15   1.00   0.74   0.26   0.56   1.88   3.67   6.90  10.95 
2.6    37.1    0.734    1.12   0.99   0.73   0.26   0.55   1.87   3.70   7.02  11.25 
2.7    36.8    0.755    1.10   0.97   0.72   0.27   0.54   1.87   3.72   7.13  11.55 
2.8    36.6    0.776    1.08   0.96   0.72   0.27   0.53   1.86   3.74   7.25  11.80 
2.9    36.3    0.796    1.06   0.95   0.71   0.27   0.52   1.86   3.76   7.36  12.0 
3.0    36.0    0.818    1.04   0.93   0.71   0.28   0.51   1.85   3.78   7.47   -- 
3.2    --      --       1.01   0.90   0.69   0.28   0.49   1.84   3.81   7.65  12.8 
3.4    35.1    0.895    0.98   0.88   0.68   0.29   0.47   1.83   3.84   7.84  13.36 
3.6    --      --       0.95   0.86   0.67   0.29   0.46   1.81   3.87   8.00  13.83 
3.8    34.2    0.966    0.92   0.84   0.66   0.29   0.44   1.80   3.89   8.16  14.23 
4.0    33.9    1.000    0.90   0.82   0.65   0.29   0.42   1.78   3.91   8.30  14.70 
4.5    33.0    1.081    0.84   0.78   0.63   0.30   0.39   1.75   3.93   8.40  15.62 
5.0    32      1.155    0.80   0.74   0.62   0.30   0.37   1.71   3.9    8.86  16.45 

(-- : entry missing or illegible.) 
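The probability of the mean ordinate, needed in step 4 of the proposed method, can be checked independently of the table. The sketch below is an assumption-laden illustration, not the author's procedure: it uses the standard lognormal identities C_s = 3C_v + C_v^3 (assumed to be Eq. 16), σ_y² = log_e(1 + C_v²), and P(x ≥ x̄) = 1 − Φ_N(σ_y/2), where Φ_N is the standard normal distribution function. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cv_from_skew(cs: float, tol: float = 1e-12) -> float:
    """Solve the assumed relation C_s = 3*C_v + C_v**3 for C_v by bisection."""
    lo, hi = 0.0, max(1.0, cs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 3.0 * mid + mid ** 3 < cs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prob_at_mean_percent(cs: float) -> float:
    """Percentage of a lognormal variate equal to or greater than its mean."""
    cv = cv_from_skew(cs)
    sigma_y = math.sqrt(math.log(1.0 + cv * cv))
    # The mean lies sigma_y / 2 standard deviations above the mean of y = ln x.
    return 100.0 * (1.0 - NormalDist().cdf(sigma_y / 2.0))


for cs in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(f"C_s = {cs:.1f}: C_v = {cv_from_skew(cs):.3f}, "
          f"P at mean = {prob_at_mean_percent(cs):.1f}%")
```

The values obtained this way agree with the legible entries of Table 2 to within a few tenths of one percent, so the same functions can serve in place of interpolation when a value of C_s falls between tabulated rows.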


TABLE 3 - Frequency Factors for C_s = 1.139 


                                 Probability P in percentage 
Law of                   Equal to or greater than the given variate 
Distribution             99     95     80     50     20      5      1    0.1   0.01 

Sign of K                 -      -      -      -      +      +      +      +      + 

Extreme-Value Law      1.64   1.31   0.82   0.16   0.72   1.87   3.14   4.94   6.73 
Log-Probability Law    1.61   1.30   0.83   0.17   0.73   1.87   3.12   4.94   6.82 
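The close agreement between the two rows of Table 3 can be reproduced from closed-form expressions. The sketch below is a hedged illustration rather than the author's computation: it assumes the usual frequency-factor formulas, K = −(√6/π)[γ + log_e(−log_e(1 − p))] for the extreme-value law and K = [exp(σ_y z − σ_y²/2) − 1]/C_v for the log-probability law, with p the exceedance probability as a fraction, z the standard normal deviate, and C_v = 0.364. All function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772157  # Euler's constant, as listed in the nomenclature


def extreme_value_k(p_exceed: float) -> float:
    """Frequency factor of the extreme-value (Gumbel) law, assumed formula."""
    reduced_variate = -math.log(-math.log(1.0 - p_exceed))
    return (math.sqrt(6.0) / math.pi) * (reduced_variate - EULER_GAMMA)


def log_probability_k(p_exceed: float, cv: float) -> float:
    """Frequency factor of the log-probability (lognormal) law, assumed formula."""
    sigma_y = math.sqrt(math.log(1.0 + cv * cv))
    z = NormalDist().inv_cdf(1.0 - p_exceed)
    return (math.exp(sigma_y * z - 0.5 * sigma_y ** 2) - 1.0) / cv


CV = 0.364  # value of C_v corresponding to C_s = 1.139 in the text
for percent in (99, 95, 80, 50, 20, 5, 1, 0.1, 0.01):
    p = percent / 100.0
    print(f"P = {percent:>5}%   extreme-value K = {extreme_value_k(p):6.2f}   "
          f"log-probability K = {log_probability_k(p, CV):6.2f}")
```

The two columns printed by this sketch differ by only a few hundredths except at the extreme tails, which is the sense in which the extreme-value law behaves as a special case of the log-probability law at C_s = 1.139.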


TABLE 4 - Values of x/x̄ for the Construction of Curves in Fig. 2 


Values                  Probability P in percentage 
of C_s         Equal to or greater than the given variate 
            99     95     80     50     20      5      1    0.1 

0.0       0.15   0.40   0.69   1.00   1.31   1.60   1.85   2.13 
0.5       0.28   0.46   0.69   0.97   1.29   1.65   1.99   2.41 
1.0       0.39   0.51   0.69   0.95   1.27   1.67   2.10   2.72 
1.139     --     0.53   0.70   0.94   1.27   1.68   2.14   2.80 
1.5       0.47   0.56   0.71   0.93   1.25   1.69   2.20   3.00 
2.0       0.53   0.60   0.72   0.91   1.22   1.69   2.28   3.28 
3.0       0.62   0.66   0.74   0.90   1.19   1.67   2.38   3.72 
5.0       0.72   0.73   0.77   0.89   1.13   1.62   2.44   4.22 

(-- : entry missing or illegible.) 

Fig. 1. Characteristic Values of the Log-Probability Law. 
Fig. 2. Log-Probability Plottings for C_v = 0.364. 
Fig. 3. Straight Line Plottings for C_s = 1.139. 